Corynebacterium terpenotabidum Takeuchi et al. 1999 is a member of the genus Corynebacterium, which contains Gram-positive, non-spore-forming bacteria with a high G+C content. C. terpenotabidum was isolated from soil on the basis of its ability to degrade squalene and belongs to the aerobic, non-hemolytic corynebacteria. It tolerates salt (up to 8%) and is related to Corynebacterium variabile, which is involved in cheese ripening. As this is a type strain of Corynebacterium, this project describing the 2.75 Mbp long chromosome with its 2,369 protein-coding and 72 RNA genes will aid the Genomic Encyclopedia of Bacteria and Archaea project.



Introduction 

Strain Y-11T (= DSM 44721T) is the type strain of the species Corynebacterium terpenotabidum [1]. It was originally isolated from soil, although the exact source has not been published [2,3]. The genus Corynebacterium comprises Gram-positive bacteria with a high G+C content. It currently contains over 80 members [4] isolated from diverse sources such as human clinical samples [5] and animals [6], but also from soil [7] and ripening cheese [8].

Within this diverse genus, C. terpenotabidum has been proposed to form a subclade together with C. variabile DSM 20132T and C. nuruki S6-4T, with which it shares 97.4% and 95.9% 16S rRNA gene sequence similarity, respectively. Information on the strain is scarce. It was isolated for its ability to metabolize the linear triterpene squalene and was classified as an Arthrobacter species [2,3], but no further information on the strain was supplied. Neither the origin nor the exact isolation procedures were reported. C. terpenotabidum can cleave squalene to yield geranylacetone [2] and also accepts some squalene derivatives [3].

Here we present a summary classification and a set of features for C. terpenotabidum DSM 44721T, together with the description of the genomic sequencing and annotation.

Classification and features 

A representative genomic 16S rRNA sequence of C. terpenotabidum DSM 44721T was compared to the Ribosomal Database Project database [9]. C. terpenotabidum shows highest similarity to C. variabile (97.4%).

Figure 1 shows the phylogenetic neighborhood of C. terpenotabidum in a 16S rRNA based tree. Within the genus Corynebacterium, C. terpenotabidum forms a distinct subclade together with C. variabile and C. nuruki.

C. terpenotabidum Y-11T cells are Gram-positive, non-acid-fast rods (1.0-1.5 µm × 0.5-0.8 µm wide) that grow strictly aerobically in rough, grayish-white colonies without diffusible pigments or aerial mycelia [1] (Table 1). Cells grow with a wax-like quality on solid medium and tend to clot in liquid culture. Scanning electron micrographs of liquid-grown cultures revealed slight morphological differences between free-floating cells and clotted cells (Figure 2).
[Figure 1: 16S rRNA-based phylogenetic tree. Taxa shown, with strain designations and sequence accession numbers:]

Corynebacterium diphtheriae NCTC 11397T (X84248)
Corynebacterium pseudotuberculosis CIP 102968T (X81916)
Corynebacterium vitaeruminis NCTC 20294T (X84680)
Corynebacterium aquilae CECT 5993T (AJ496733)
Corynebacterium argentoratense CIP 104296T (X83955)
Corynebacterium felinum CCUG 39943T (AJ401282)
Corynebacterium renale CIP 103421T (X81909)
Corynebacterium singulare CCUG 37330T (Y10999)
Corynebacterium aurimucosum IMMIB D-1488T (AJ309207)
Corynebacterium striatum NCTC 764T (X84442)
Corynebacterium flavescens NCDO 1320 (X84441)
Corynebacterium thomssenii DSM 44276T (AF010474)
Corynebacterium casei LMG S-19264T (AF267152)
Corynebacterium ammoniagenes CIP 101283T (X84440)
Corynebacterium pilosum ATCC 29592T (X81908)
Corynebacterium halotolerans YIM 70093T (AY226509)
Corynebacterium maris Coryn-1T (FJ423600)
Corynebacterium humireducens MFC-5T (GQ421281)
Corynebacterium marinum 7015T (DQ219354)
Corynebacterium efficiens YS-314T (AB055963)
Corynebacterium glutamicum ATCC 13032T (AF314192)
Corynebacterium callunae NCFB 10338T (X84251)
Corynebacterium xerosis ATCC 373T (X81914)
Corynebacterium freneyi 20695110T (AJ292762)
Corynebacterium terpenotabidum IFO 14764T (AB004730)
Corynebacterium variabile DSM 20132T (AJ222815)
Corynebacterium nuruki S6-4T (HM165487)
Rhodococcus equi DSM 20307T (X80614)



Figure 1. Phylogenetic tree highlighting the position of C. terpenotabidum relative to type strains of other species within the genus Corynebacterium. Species with at least one publicly available genome sequence (not necessarily the type strain) are highlighted in bold face. The tree is based on sequences aligned by the RDP aligner and uses the Jukes-Cantor corrected distance model to construct a distance matrix based on alignment model positions without alignment inserts, using a minimum comparable position of 200. The tree was built with RDP Tree Builder, which uses the Weighbor method [10] with an alphabet size of 4 and length size of 1,000. Tree building also involves a bootstrapping process repeated 100 times to generate a majority consensus tree [11]. Rhodococcus equi (X80614) was used as an outgroup.
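
The Jukes-Cantor correction named in the caption converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The Python sketch below illustrates only that formula. It is not the RDP implementation, the function names are hypothetical, and treating the reported 97.4% similarity to C. variabile as p = 0.026 is an assumption, since RDP may count comparable positions differently.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Columns with a gap ('-') in either sequence are skipped, loosely
    mirroring the idea of using only comparable positions.
    """
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the correction to be defined")
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


# Assumption: 97.4% similarity is read as 2.6% differing sites.
d_variabile = jukes_cantor_distance(1.0 - 0.974)
print(f"corrected distance: {d_variabile:.4f} substitutions per site")
```

For small p the corrected distance is only slightly larger than p itself, which is why closely related 16S rRNA sequences change little under this model.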
Table 1. Classification and general features of C. terpenotabidum Y-11T according to the MIGS recommendations [12].

MIGS ID    Property                    Term                                            Evidence code (a)
---------  --------------------------  ----------------------------------------------  -----------------
           Current classification      Domain Bacteria                                 TAS [13]
                                       Phylum Actinobacteria                           TAS [14]
                                       Class Actinobacteria                            TAS [15]
                                       Order Actinomycetales                           TAS [15-18]
                                       Family Corynebacteriaceae                       TAS [15-17,19]
                                       Genus Corynebacterium                           TAS [15-17,20,21]
                                       Species Corynebacterium terpenotabidum          TAS [1]
                                       Type strain Y-11 (= DSM 44721)                  TAS [1]
           Gram stain                  positive                                        TAS [1]
           Cell shape                  rod-shaped                                      TAS [1]
           Motility                    non-motile                                      TAS [1]
           Sporulation                 non-sporulating                                 TAS [1]
           Temperature range           mesophile                                       TAS [1]
           Optimum temperature         28°C                                            TAS [1]
           Salinity                    0-8% (w/v) NaCl                                 TAS [1]
MIGS-22    Oxygen requirement          aerobe                                          TAS [1]
           Carbon source               fructose, galactose, mannose, lactate, ethanol  TAS [1]
           Energy metabolism           chemoorganoheterotrophic                        NAS
           Terminal electron acceptor  oxygen                                          NAS
MIGS-6     Habitat                     soil                                            TAS [2]
MIGS-15    Biotic relationship         free-living                                     NAS
MIGS-14    Pathogenicity               non-pathogenic                                  NAS
           Biosafety level             1                                               NAS
MIGS-23.1  Isolation                   not reported
MIGS-4     Geographic location         not reported
MIGS-5     Sample collection time      not reported
MIGS-4.1   Latitude                    not reported
MIGS-4.2   Longitude                   not reported
MIGS-4.3   Depth                       not reported
MIGS-4.4   Altitude                    not reported

a) Evidence codes - TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [22].



C. terpenotabidum was found to utilize fructose, galactose, mannose, lactate, and ethanol as carbon sources, whereas many other compounds, including arginine, aspartate, histidine, methylamine, ethylamine, methanol, galactose, lactose, maltose, sucrose, glycerol, sorbitol, mannitol, inositol, citrate, succinate, malonate, pimelate, m-hydroxybenzoate and p-hydroxybenzoate, cannot be used. Optimal growth of strain Y-11T is reported at 28°C. C. terpenotabidum was shown to grow at salinities between 0 and 8.0% (w/v) NaCl, with no growth at 10% [1]. Biochemical characterization revealed positive reactions for urease, catalase, and hydrolysis of Tween 80.

Chemotaxonomy 

The cell wall of C. terpenotabidum Y-11T contains alanine, glutamic acid, and meso-diaminopimelic acid in a molar ratio of 2.12:1.00:0.97. The main cell wall sugars are described to be arabinose, galactose, and mannose in a molar ratio of 2.47:1.71:1.00. The glycan moiety of the cell wall was found to contain acetyl residues [1].

In C. terpenotabidum, cellular fatty acids are composed mainly of oleic acid (C18:1ω9c, 31%), palmitic acid (C16:0, 28%), and tuberculostearic acid (10-methyl C18:0, 21%). The whole-cell methanolysate of strain Y-11T contained mycolic esters [1]. The predominant isoprenoid quinone is menaquinone MK-9(H2).



Genome sequencing and annotation 

Genome project history 

C. terpenotabidum Y-11T was selected for sequencing as part of a project to define the core genome and pan-genome of the non-pathogenic corynebacteria. Although the strain is not part of the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project [23], sequencing of the type strain will nonetheless aid the GEBA effort. The genome project is deposited in the Genomes OnLine Database [24] and the complete genome sequence is deposited in GenBank. Sequencing, finishing and annotation were performed by the Center of Biotechnology (CeBiTec). A summary of the project information is shown in Table 2.



Table 2. Genome sequencing project information

MIGS ID    Property                    Term
---------  --------------------------  ----------------------------------------
MIGS-31    Finishing quality           Finished
MIGS-28    Libraries used              Two genomic libraries: one 454 pyrosequencing PE library (3.4 kb insert sizes), one Illumina library
MIGS-29    Sequencing platforms        454 GS FLX Titanium, Illumina MiSeq
MIGS-31.2  Sequencing coverage         29.52× pyrosequencing; 61.71× SBS
MIGS-30    Assemblers                  Newbler version 2.3
MIGS-32    Gene calling method         GeneMark, Glimmer
           INSDC ID                    CP003696
           GenBank Date of Release     September 1, 2013 / after publication
           GOLD ID                     GM8852
           NCBI project ID             168617
MIGS-13    Source material identifier  DSM 44721
           Project relevance           Industrial, GEBA



Growth conditions and DNA isolation 

C. terpenotabidum strain Y-11T, DSM 44721, was grown aerobically in LB broth (Carl Roth GmbH, Karlsruhe, Germany) at 30 °C. DNA was isolated from ~10^8 cells using the protocol described by Tauch et al. 1995 [25].

Genome sequencing and assembly 

The genome was sequenced using a 454 sequencing platform. A standard 3k paired-end sequencing library was prepared according to the manufacturer's protocol (Roche). The genome was sequenced using the GS-FLX platform with Titanium chemistry, yielding 384,252 total reads and providing 29.52× coverage of the genome. Pyrosequencing reads were assembled using the Newbler assembler v2.3 (Roche). The initial Newbler assembly consisted of 22 contigs in six scaffolds. Analysis of the six scaffolds revealed five that made up the chromosome, while the remaining one contained five copies of the RRN operon that caused the scaffold breaks. The scaffolds were ordered based on alignments to the complete genome of C. variabile [26] and subsequent verification by restriction digestion, Southern blotting and hybridization with a 16S rDNA-specific probe.

The Phred/Phrap/Consed software package [27-30] was used for sequence assembly and quality assessment in the subsequent finishing process. After the shotgun stage, gaps between contigs were closed by editing in Consed (for repetitive elements) and by PCR with subsequent Sanger sequencing (IIT Biotech GmbH, Bielefeld, Germany). A total of 12 additional reactions were necessary to close gaps not caused by repetitive elements.
To raise the quality of the assembled sequence, Illumina reads were used to correct potential base errors and increase consensus quality. A WGS library was prepared using the Illumina-Compatible Nextera DNA Sample Prep Kit (Epicentre, WI, U.S.A.) according to the manufacturer's protocol. The library was sequenced in a 2 × 120 bp paired-read run on the MiSeq platform, yielding 2,307,926 total reads. Together, the Illumina and 454 sequencing platforms provided 91.2× coverage of the genome.
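
The coverage figures can be cross-checked with simple arithmetic, since mean coverage is the number of sequenced bases divided by the genome length. The sketch below uses only numbers quoted in this report (read counts, the 2,751,233 bp chromosome, and the per-platform coverages of 29.52× and 61.71×). The variable names are hypothetical, and the Illumina upper bound assumes that every read is a full 120 bases and that all reads were used, which the text does not state.

```python
# Figures quoted in this report.
GENOME_SIZE_BP = 2_751_233
READS_454 = 384_252
COVERAGE_454 = 29.52
READS_ILLUMINA = 2_307_926
READ_LENGTH_ILLUMINA = 120  # nominal read length of the 2 x 120 bp run
COVERAGE_SBS = 61.71


def coverage(total_bases: float, genome_size: int = GENOME_SIZE_BP) -> float:
    """Mean depth of coverage: sequenced bases divided by genome length."""
    return total_bases / genome_size


# Sum of the two per-platform coverage values.
combined = COVERAGE_454 + COVERAGE_SBS

# Mean 454 read length implied by the reported read count and coverage.
implied_454_read_length = COVERAGE_454 * GENOME_SIZE_BP / READS_454

# Upper bound on Illumina coverage under the full-length, all-reads assumption.
illumina_upper_bound = coverage(READS_ILLUMINA * READ_LENGTH_ILLUMINA)

print(f"combined coverage:        {combined:.2f}x")
print(f"implied 454 read length:  {implied_454_read_length:.0f} bases")
print(f"Illumina nominal maximum: {illumina_upper_bound:.1f}x")
```

The combined value rounds to the 91.2× quoted above. The nominal Illumina maximum is higher than the 61.71× reported; the report does not describe read trimming or filtering, so no cause is inferred here.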

Genome annotation 

Gene prediction and annotation were done using the PGAAP pipeline [31]. Genes were identified using GeneMark [32], GLIMMER [33], and Prodigal [34]. For annotation, BLAST searches against the NCBI Protein Clusters Database [35] were performed, and the annotation was enriched by searches against the Conserved Domain Database [36] and subsequent assignment of coding sequences to COGs. Non-coding genes and miscellaneous features were predicted using tRNAscan-SE [37], Infernal [38], RNAmmer [39], Rfam [40], TMHMM [41], and SignalP [42].

Genome properties 

The genome consists of one circular chromosome of 2,751,233 bp (67.02% G+C content) with no extrachromosomal elements. A total of 2,441 genes were predicted, 2,369 of which are protein-coding genes. Of the protein-coding genes, 1,306 (55.13%) were assigned a putative function, and the remainder were annotated as hypothetical proteins. In addition, 910 protein-coding genes belong to 281 paralogous families in this genome, corresponding to a gene content redundancy of 38.41% (Figure 3). The properties and statistics of the genome are summarized in Tables 3 and 4.
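
The percentages in this paragraph follow directly from the gene counts. As a minimal sketch, the Python snippet below recomputes them; the variable names are hypothetical, the G+C base count is taken from Table 3, and the protein-coding gene count is assumed to be the denominator for the function-prediction and paralog figures, which matches the quoted values.

```python
# Counts quoted in the text and in Table 3.
genome_bp = 2_751_233
gc_bp = 1_843_810
total_genes = 2_441
protein_coding = 2_369
rna_genes = 72
with_function = 1_306
in_paralog_families = 910


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


# Genes without a putative function are annotated as hypothetical proteins.
hypothetical = protein_coding - with_function

print("G+C content (%):           ", pct(gc_bp, genome_bp))
print("with putative function (%):", pct(with_function, protein_coding))
print("in paralogous families (%):", pct(in_paralog_families, protein_coding))
print("hypothetical proteins:     ", hypothetical)
```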



Table 3. Genome Statistics

Attribute                                 Value      % of total (a)
----------------------------------------  ---------  --------------
Genome size (bp)                          2,751,233  100.00
DNA coding region (bp)                    2,441,394  88.74
DNA G+C content (bp)                      1,843,810  67.02
Total genes                               2,441      100.00
RNA genes                                 72         2.96
rRNA operons                              5
tRNA genes                                57         2.34
Protein-coding genes                      2,369      97.04
Genes with function prediction (protein)  1,306      55.13
Genes assigned to COGs                    1,812      74.23
Genes in paralog clusters                 910        38.41
Genes with signal peptides                224        9.54
Genes with transmembrane helices          606        25.58

a) The total is based on either the size of the genome in base pairs or the total number of genes in the annotated genome.
Figure 3. Graphical map of the chromosome. From the outside in: genes on forward strand (colored according to COG categories), genes on reverse strand (colored according to COG categories), GC content, GC skew.
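
The two inner rings of such a map are usually computed per sequence window: GC content as (G+C)/(A+C+G+T) and GC skew as (G-C)/(G+C). The Python sketch below shows these standard definitions on a toy sequence. The function names, window size and step are assumptions for illustration; the report does not state which tool or window settings were used for Figure 3.

```python
from typing import Iterator, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    acgt = g + c + seq.count("A") + seq.count("T")
    return (g + c) / acgt if acgt else 0.0


def gc_skew(seq: str) -> float:
    """GC skew (G - C) / (G + C); 0.0 if the window has no G or C."""
    seq = seq.upper()
    g, c = seq.count("G"), seq.count("C")
    return (g - c) / (g + c) if (g + c) else 0.0


def sliding_windows(seq: str, window: int, step: int) -> Iterator[Tuple[int, str]]:
    """Yield (start, subsequence) for each full window along the sequence."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]


# Toy sequence; a real analysis would read the chromosome from a FASTA file.
toy = "GGGCATGCCCATGGGA"
for start, win in sliding_windows(toy, window=8, step=4):
    print(start, f"GC={gc_content(win):.2f}", f"skew={gc_skew(win):+.2f}")
```

On bacterial chromosomes the sign of the GC skew commonly changes near the replication origin and terminus, which is why it is plotted alongside GC content.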
Table 4. Number of genes associated with the general COG functional categories

Code  Value  %age   Description
----  -----  -----  ------------------------------------------------------------
J     151    6.37   Translation, ribosomal structure and biogenesis
A     1      0.04   RNA processing and modification
K     152    6.42   Transcription
L     136    5.74   Replication, recombination and repair
B     0      0.00   Chromatin structure and dynamics
D     20     0.84   Cell cycle control, cell division, chromosome partitioning
Y     0      0.00   Nuclear structure
V     32     1.35   Defense mechanisms
T     58     2.45   Signal transduction mechanisms
M     81     3.42   Cell wall/membrane biogenesis
N     1      0.04   Cell motility
Z     0      0.00   Cytoskeleton
W     0      0.00   Extracellular structures
U     26     1.10   Intracellular trafficking and secretion, and vesicular transport
O     72     3.04   Posttranslational modification, protein turnover, chaperones
C     127    5.36   Energy production and conversion
G     115    4.85   Carbohydrate transport and metabolism
E     218    9.20   Amino acid transport and metabolism
F     68     2.87   Nucleotide transport and metabolism
H     97     4.09   Coenzyme transport and metabolism
I     121    5.11   Lipid transport and metabolism
P     151    6.37   Inorganic ion transport and metabolism
Q     76     3.21   Secondary metabolites biosynthesis, transport and catabolism
R     274    11.57  General function prediction only
S     138    5.83   Function unknown
-     557    23.51  Not in COGs
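
The percentages in Table 4 appear to be relative to the 2,369 protein-coding genes, which can be checked with a short calculation. The Python sketch below is only a consistency check on the tabulated values; the dictionary and variable names are hypothetical. It also compares the sum of the per-category counts with the 1,812 genes assigned to COGs in Table 3.

```python
# Per-category gene counts as listed in Table 4.
cog_counts = {
    "J": 151, "A": 1, "K": 152, "L": 136, "B": 0, "D": 20, "Y": 0,
    "V": 32, "T": 58, "M": 81, "N": 1, "Z": 0, "W": 0, "U": 26,
    "O": 72, "C": 127, "G": 115, "E": 218, "F": 68, "H": 97,
    "I": 121, "P": 151, "Q": 76, "R": 274, "S": 138,
}
not_in_cogs = 557
protein_coding = 2_369  # from Table 3


def share(count: int, total: int = protein_coding) -> float:
    """Percentage of protein-coding genes, rounded to two decimals."""
    return round(100 * count / total, 2)


category_assignments = sum(cog_counts.values())
genes_in_cogs = protein_coding - not_in_cogs

print("genes in COGs:       ", genes_in_cogs)
print("category assignments:", category_assignments)
print("share of category R: ", share(cog_counts["R"]))
print("share not in COGs:   ", share(not_in_cogs))
```

The number of category assignments exceeds the number of genes in COGs, which would be consistent with some genes being assigned to more than one functional category.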